FOREWORD



Improvement in the efficiency and economy of individual
enlisted training, evaluation, and utilization is essential to
maintain maximum combat readiness of the Army, and is a major
concern of the Individual Training & Skill Evaluation Technical
Area of the Army Research Institute for the Behavioral and Social
Sciences (ARI). The present Army policy emphasizes performance-based
training and testing; ARI research has made possible the
development, validation, and application of performance-based/
criterion-referenced Skill Qualification Tests (SQTs) as well as
self-contained procedures by which Army Test Development Agencies
can construct and validate the SQTs.


The present report discusses the SQT program, its principles
of test construction, and the benefits expected in its utilization.
Research was accomplished under Army Project 2Q762722A76A,
and is directly responsive to the requirements of the Individual
Training and Evaluation Directorate (ITED) of the Army Training
Support Center, Fort Eustis, Virginia.




J. E. UHLANER
Technical Director 



CRITERION-REFERENCED JOB PROFICIENCY TESTING: A LARGE
SCALE APPLICATION



BRIEF 



Requirement:

Army training and personnel management require job
performance tests that are fair to all soldiers, feasible for
worldwide administration, and measure performance on critical
job tasks.



Procedure: 



Procedures for developing Skill Qualification Tests (SQTs)
were prepared and tried out by test development agencies.
The procedures cover assuring that the tests have content
validity and verifying that the tests are accurate measures of
performance.



Results:

Procedures for developing criterion-referenced,
performance-based evaluations of task performance.

Procedures for determining accuracy of the tests as
measures of performance.

Guidelines and self-instructional materials for developing
SQTs. The procedures are designed to assure that the
tests are based on realistic job requirements and that the
scores reflect successful task performance (that is, they
are criterion referenced). The general test content,
therefore, can be open knowledge, and subsequent management
decisionmaking can be based on how well soldiers
attain performance standards.




Utilization:



The procedures for constructing and validating Skill
Qualification Tests are in use for developing more than 1000 tests for
evaluating job proficiency in the Army enlisted force. The
guidelines and self-instructional materials are used to train
personnel at the more than thirty Test Development Agencies on
how to develop SQTs.



CRITERION-REFERENCED JOB PROFICIENCY TESTING: A LARGE SCALE
APPLICATION



OVERVIEW

Skill Qualification Tests (SQT) have been developed to replace
Military Occupational Specialty (MOS) proficiency tests as measures of
ability to perform Army enlisted jobs. SQTs are performance-based,
criterion-referenced measures of job proficiency, consisting of precisely
defined tests of tasks, all of which are critical and necessary to
performance of the job. The criterion-referenced approach provides an
explicit relationship between job requirements and test content in that
job requirements dictate content of SQTs. The SQT development process
requires that tests be reviewed by subject matter experts and validated
on representative job incumbents to assure that test content is job
relevant. Test standards of acceptable levels of performance are also
based on job requirements and test content. Performance standards are
based on behaviorally derived absolute scoring standards, and are not
based on performance relative to other soldiers who take the test. For
these reasons SQTs are justifiably viewed as criterion-referenced tests
of job proficiency.

A criterion-referenced testing system offers two significant
advantages not available in traditional testing programs. One is that test
content can be made public in advance of administration. There are no
reasons to keep test content secret in a testing program based on
explicit linkages between test content and job requirements. Advance
knowledge of test content results in an equitable and open system.
Everyone has an equal opportunity to acquire proficiency on the specific
job tasks known to be included in the test.



The second is that a criterion-referenced approach allows personnel
management decisions such as promotion, selection, and advanced
schooling to be based on performance standards instead of personnel quotas.
In more complicated situations involving the merging or splitting of job
specialties at higher skill levels, soldiers from different specialties
can be compared on the basis of their levels of competence instead of
their relative standing in the testing group. Criterion-referenced
testing of job proficiency has opened new opportunities for both training
and personnel management.

BACKGROUND 

The Army has been using tests to measure job proficiency for over
15 years. These tests, called Military Occupational Specialty (MOS)
proficiency tests, were designed primarily to help personnel managers in
making decisions of vital importance to individuals' careers, such as
proficiency pay, promotion, and assignments. The MOS tests were
traditional achievement tests, consisting of 125 multiple choice items, each
with four alternatives. The test content was related generally to the
domain of job performance, but there was no definitive logical
correspondence between test items and specific job requirements. Each item was
scored pass-fail, the total score was the number of items correct, and
the total score was then used to rank order persons in each job specialty.
Therefore, any referencing of test score to test content was immediately
abandoned.

While such proficiency tests have use in personnel management
decisions, they did not fully serve the Army training needs. Because of
content limitations, lack of content-score correspondence, minimal
diagnostic utility, and the long delay in providing feedback to the field
(up to one year after testing), Army trainers did not find MOS tests
particularly useful for determining training requirements, measuring
individual and unit performance, and defining training readiness.

Army training during this same period, especially in the late 1960's
and early 1970's, was undergoing a major revolution. Performance-based
training and testing, based on critical job tasks and
criterion-referenced standards of performance, were being implemented in entry-level
training courses. Training objectives were operationally defined by the
performance tests given during the course, and tests were made public to
students as well as instructors. The content of these tests was always
directly relevant to the job. The tests themselves were used to drive
the direction of training.

Tests, because of their function in maintaining accountability,
are effective instruments in bringing about institutional change. Test
content helps implement doctrine about the way jobs are to be performed,
and is helpful in defining training requirements and standards. The
public nature of the tests helps focus attention on the critical elements of
the job, enables effective use of soldiers' time in preparing for tests,
and thus improves individual readiness.

So impressive was the success of performance-based training and
testing that the Army made the policy decision to change from the
existing mode of job proficiency testing, typically referred to as
"norm-referenced, paper-and-pencil testing," to the criterion-referenced mode
of proficiency testing. These new criterion-referenced tests, called
Skill Qualification Tests (SQT), are having a profound impact on the
entire Army community. The new testing procedures are forcing training
managers, personnel managers, and research support personnel to rethink
and often redefine their functions.




REQUIREMENTS FOR SKILL QUALIFICATION TESTS

The basic requirement of SQTs is that the tests are job relevant.
The test content must be based on job requirements, and the test scores
must be accurate measures of ability to perform critical job tasks.



The job relevance of SQTs is assured by basing them on Soldier's
Manuals. Soldier's Manuals specify the critical job tasks, the behaviors
required to perform the tasks, the job conditions, and the standards
of performance. Soldier's Manuals define the jobs in that they list all
the tasks soldiers in a job specialty are responsible for performing.
Since SQTs are based on Soldier's Manuals, the SQTs are job relevant.

PERFORMANCE INFORMATION FOR TRAINING AND PERSONNEL MANAGEMENT 

SQTs are used by both training and personnel management to help make
important decisions affecting the career development of soldiers. Both
training and personnel managers need timely and accurate information
about how well individuals perform: training management to determine
training requirements, and personnel management to help determine
which soldiers to promote. Although both training and personnel
management have a need for the same kind of information, their
immediate requirements are not identical.

Training managers base their immediate training requirements on the
specific tasks performed in their units. The job relevance of tests for
specific assignments, therefore, is the primary consideration from this
point of view, and it is defined in terms of the tasks that soldiers
perform in their assignments. The set of tasks performed in an assignment
is generally a subset of the tasks required in a specialty. The task is
a convenient unit for determining training requirements because tasks are
observable, have initiating and terminating cues, and have standards of
performance that can be reasonably well specified. Decisions about
proficiency can be made at the task level, and training managers can
identify the specific tasks on which soldiers need training. If the
test measures performance on the specific tasks for which the training
managers have responsibility, then the tests are serving their basic
purpose.

Personnel managers are also concerned with the job performance of
individual soldiers; but rather than focusing on soldiers' specific
assignments, personnel managers need to know how well soldiers can
perform all the tasks in a specialty. For example, performance in a
specialty, such as Infantryman or Wheeled Vehicle Mechanic, cannot
necessarily be inferred from the set of tasks found in any one
assignment. Personnel managers, therefore, have a need for information based
on a standard set of tasks for each specialty. All soldiers in a
specialty need to be evaluated on the same set of tasks to enable fair
decisions about which soldiers to promote, retain, or reclassify. The
need for a standard set of tasks in each specialty imposes additional
testing requirements for feasibility and acceptability. The test scores
should not be affected by when or where the test is taken, nor by whom
it is administered and scored. The testing conditions, as well as
performance standards, should be standardized.
JOB RELEVANCE

The requirement for standardization at the present state of
the art in testing means that initially most of the test content is in
the paper-and-pencil mode rather than hands-on performance tests.
Paper-and-pencil tests generally lack the apparent job relevance of hands-on
performance tests, and therefore an additional requirement is imposed to
assure that the tests are acceptable to examinees, supervisors, and
commanders as valid measures of job proficiency.

Job relevance of the tests is the basic requirement for both training
and personnel management, even though the definition of job relevance
may have somewhat different meanings for the two purposes. For training
purposes the focus is on the subset of tasks performed in the specific
job assignment, whereas for personnel purposes the interest is the
entire set of tasks in the specialty.

The SQTs are designed to serve the requirements of training and
personnel management. Because of their somewhat divergent immediate
needs, critical issues arise in how SQTs are developed, scored, and
used. These issues, notably the public nature of test content and
personnel quotas as performance standards, are treated in this paper.

The next section describes the development of Skill Qualification
Tests and expands on the technical requirements, managerial requirements,
and practical constraints described in this section. The subsequent
sections describe assumptions in scoring SQT and benefits resulting from
adopting a criterion-referenced approach to SQT development. The
magnitude of these benefits far outweighs the costs of developing and
implementing such a large-scale program.

DEVELOPMENT OF SKILL QUALIFICATION TESTS 

The Skill Qualification Testing (SQT) program is a large scale
attempt to provide valid and efficient measures of job proficiency. This
section describes the process of developing an SQT, which assures that
the tests are fair, feasible and acceptable. Because of the strategic
importance of Skill Qualification Tests to both training and personnel
management, high level policy decisions were made about test content,
validation, and scoring. The general requirements of the program are
that tests must (a) be fair and feasible and (b) have validity
demonstrated in advance of operational use.

FAIRNESS AND FEASIBILITY OF THE TESTS

Fairness means that all soldiers have an equal opportunity to
demonstrate their true level of job competence. Test content must be based
on actual job requirements, and testing conditions must be sufficiently
constant throughout the Army so that scores obtained from administrations
under varied conditions are not noticeably different. Tests given in
Alaska, Panama, and Korea must all be administered under similar
conditions, and, in addition, all persons administering and scoring the tests
must be able to do so accurately and objectively. An additional
requirement is that the tests must be acceptable to soldiers and knowledgeable
experts as fair measures of ability to perform critical job tasks.
Therefore, fairness attends to requirements of both training and personnel
management.

Feasibility requires that the tests be suitable for administration in
all types of units; equipment, terrain, personnel, and all testing
material must be readily available. Another aspect of feasibility is
that testing time must be reasonable, with up to one day allowed for
testing each soldier.

The requirements that Skill Qualification Tests be fair and feasible
put severe limitations on the use of hands-on performance tests. The
history of performance testing is that scoring accuracy and
standardization are difficult to obtain. The resolution of the fairness and
feasibility requirements is to have several kinds of testing. Under
present policy decisions, all Skill Qualification Tests contain a written
component, and some Skill Qualification Tests contain a hands-on component.
Four hours of testing is allowed for the written component, and up to
four hours is allowed for the hands-on portion. A third component,
called performance certification, can also be included in Skill
Qualification Tests. It is essentially an observational evaluation of actual
job performance.

Therefore, an SQT may include up to three distinct types of tests,
each with its own inherent strengths and weaknesses. A combination of
these tests is the operational answer to the fairness and feasibility
requirement.

Types of Tests. Hands-on performance tests are most desirable.
They are a form of structured observation where a scorer evaluates an
individual on a set of performance measures (observable behaviors).
Advantages of hands-on testing are obvious: it tests actual performance,
has high fidelity to the job, allows for immediate feedback, and has
high face validity to examinees. However, considerable developmental
effort is required to insure scoring reliability and standardization of
conditions. It also is expensive in terms of equipment, personnel, and
time, i.e., feasibility is often a problem. In order to ensure
feasibility there is a natural tendency to truncate tests of tasks by shrinking
the boundaries. Unfortunately, this may be at the expense of the
validity of the test. For these reasons it is extremely difficult, if
not impractical, to initiate a large-scale hands-on testing system for an
organization as large as the Army. Therefore, a hands-on component
constitutes a subset of an SQT.

An alternative form of hands-on testing is performance certification.
The performance certification component covers tasks that are too long
and/or complex to include in the hands-on component, and that do not
lend themselves to testing in a written mode. Performance certification
tests are to be administered and scored by soldiers' supervisors in the
normal job setting. Performance certification allows greater flexibility
and avoids some of the feasibility problems encountered in a hands-on
component. The greatest problems in performance certification are
insuring reasonable standardization of job testing conditions across
individuals and standardization of scoring by supervisors. Until sound
methods are developed for addressing these problems, performance
certification will remain a small portion of an SQT.

The decision to include a written component requires careful
consideration and analysis of what criterion-referenced measurement means in this
context. Since the focus of Skill Qualification Tests is on ability to
perform critical job tasks, that aspect must be retained. Each written
test of a task is to consist of a set of items, where each item is
designed to measure an essential behavior or step in performing the task.
For tasks that require primarily mental skills, such as those in supply
and administration, written tests of tasks are often similar to
behaviors required on the job, and the standards for ability to perform
the test of the task can be reasonably close to those on the job. For
other tasks that require psychomotor skills, written test items only
simulate actual job behaviors, and the setting of realistic standards
indicating ability to perform the task is a more arbitrary process. To
help approximate realistic job conditions, written items may have
multiple correct responses and a variable number of alternatives. This added
flexibility increases the difficulty in developing appropriate methods
for setting standards. The determination of reasonable standards for
written tests of tasks is one of the most difficult issues in the SQT
program.



Scoring the Tests. Because Army jobs and training programs are
structured in terms of critical tasks, the appropriate level of scoring
for the SQT should also be based on tasks. The concept of "scorable
unit" was invented to help assure criterion-referenced measurement of
task performance. A scorable unit is designed to measure ability to
perform a specific task, or in the case of complex tasks, a well defined
subtask.

Each written scorable unit consists of a set of items, where each
item is designed to measure an essential behavior or step in performing
the task. Each item is scored pass-fail, and a prescribed number of
items must be passed to be GO on the written scorable unit. A GO is
counted as ability to perform the task. The current resolution of
setting standards for written scorable units is to require that an a priori
number of items be passed. For example, if a scorable unit contains
five items, then four must be passed to obtain a GO.
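The GO rule for a written scorable unit can be illustrated with a short sketch. The function name and data layout below are illustrative assumptions, not part of the SQT program; only the rule itself (a prescribed number of pass-fail items must be passed) and the five-item, four-to-pass example come from the text.

```python
from typing import Sequence


def score_scorable_unit(item_results: Sequence[bool], required_passes: int) -> str:
    """Return "GO" if at least `required_passes` items were passed, else "NO-GO".

    `item_results` holds one pass-fail result per item (True = passed).
    `required_passes` is the a priori standard set for the scorable unit.
    """
    if not 0 <= required_passes <= len(item_results):
        raise ValueError("required_passes must be between 0 and the number of items")
    passed = sum(1 for result in item_results if result)
    return "GO" if passed >= required_passes else "NO-GO"


# Five-item scorable unit with an a priori standard of four items.
print(score_scorable_unit([True, True, False, True, True], 4))   # GO
print(score_scorable_unit([True, False, False, True, True], 4))  # NO-GO
```

Because every item carries the same weight, the standard is a simple count; no item-level scaling is assumed.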

Hands-on and performance certification scorable units consist of a
set of performance measures, where each performance measure is scored
pass-fail, and a prescribed number of performance measures must be passed
to be GO on the scorable unit. A GO on the scorable unit is interpreted
as ability to perform the task. The standards of GO generally are
comparable to what is required on the job.
The requirement that all scorable units be acceptable as fair
measures of ability to perform tasks is applied to both the hands-on and
written tests. Juries of experts must agree that the written items and
hands-on performance measures indicate ability to perform the tasks.
Perhaps a safer statement would be that failure to pass the items indicates
that the person is not able to perform the task.

The most critical requirement of SQTs is their job relevance. The
procedures for establishing job relevance are described in the following
section.



ESTABLISHING A CORRESPONDENCE BETWEEN TEST CONTENT AND JOB TASKS

Test content of all SQTs is a sample of critical tasks from the
domain of job tasks in the specialty. In this way the tests have a
specifiable and explicit link to the job. For each Army job there
exists a Soldier's Manual that lists the tasks for which a soldier in
that specialty is responsible. Therefore, this set of tasks becomes the
operational definition of the job. Tests to measure performance on
specific job tasks listed in the Soldier's Manual are developed from
appropriate task analyses, and the tests for each task are operational
definitions of performance on the tasks. Performance on the individual
tasks is summed to obtain a total score, which in turn serves as the
operational definition of job competence. Modern instructional
technology, with its emphasis on specification of objectives and verification
that those objectives are attained, supports the above process for
establishing the content and focus of SQTs, and thereby lends added
credibility to these procedures.

Though the task is the basic level of analysis, the validity of task
proficiency measurement depends on the adequacy of the test of the task.
By means of detailed task analyses, the set of performance measures or
behaviors required for successful performance of the task are identified.
These lists of performance measures are all available in the Soldier's
Manual. Each item developed to test for task proficiency must occupy a
clearly specified relationship to a performance measure required in
task performance. Assuming that the set of items developed for a test
of a task has been selected in accordance with the procedures described
above, one may assume with reasonably high confidence that successful
performance of each tested behavior is a necessary condition for
successful performance of the task. How to score the set of items in a written
scorable unit to obtain estimates of ability to perform tasks is a
complex question. Measurement error is always a problem that must be
allowed for. Whether being GO on a test of a task requires
passing all items included in the test or something less than perfection
depends on the nature of the task, the fidelity with which the task can
be tested in a written mode, the complexity of the format (e.g.,
multiple correct responses), and the number of items within the cluster. Use
of subject matter experts in reaching such a determination is mandatory.



In the case of a hands-on test of a task, measurement error arising
from the use of words is minimized. However, other measurement problems
arise. One is that a full performance test of a task generally is not
feasible. It may be too costly in terms of time, equipment, or
personnel. Therefore, a truncated test of the task is often developed by
eliminating some of the performance measures or steps required for the
full performance test. By truncating the test, though, it is possible
that the tested portion is necessary to successful performance, but
is not sufficient.

VALIDATING TESTS PRIOR TO ADMINISTRATION

The first question to be resolved was how to define validity. The
starting point was the usual definition of validity, i.e., that tests
measure what they are intended to measure. In the case of Skill
Qualification Tests, the intent is to measure ability to perform critical job
tasks. The content of the tests, therefore, becomes the crucial factor
in establishing validity. The content must be thoroughly reviewed by
experts to ensure that the right behaviors and decisions are assembled in
each scorable unit. The first requirement, then, is consistent agreement
among experts that the content of the test is based on ability to perform
critical job tasks. A second requirement is that the scorable units
discriminate between performers (masters) and nonperformers (non-masters).
A third requirement applies only to written scorable units. All items
in a written scorable unit must be consistent estimators of mastery on
the task covered by the entire scorable unit. Thus, the conceptualizing
of validity focuses on consistency: consistency between the content of
the test and the job tasks, consistency among expert reviews, and
consistency in identifying mastery.



Skill Qualification Tests are constructed and validated by Army
agencies that have resident expertise in the job specialties. Generally
these are the Army schools, but they also include other agencies, such
as the Health Services Command. Since the test content must reflect job
tasks, the test developers must have detailed task analyses available that
identify the behaviors essential to successful performance of the tasks.
Skill Qualification Tests are developed in the following conceptual
sequence:

1. Identify tasks for testing;

2. Identify behaviors or steps essential for performing each task;

3. Develop scorable units to cover essential behaviors of the task,
and review scorable units for content validity;

4. Try out scorable units on soldiers to verify accuracy of
measurement.

After each step in the process, the products are submitted to higher
headquarters for review and approval. The content of the scorable units
is fixed after step 3. Scorable units found to be unsatisfactory through
tryout on soldiers can be revised, but the content cannot be changed.
Test content is fixed through agreement among experts that the contents
of the scorable units are indeed valid measures of ability to perform the
tasks, and the tryout serves only to establish the measurement properties
of the scorable units.




The tryout with soldiers is different for the hands-on and written
components. For the hands-on tests, the primary concern is to establish
that the performance measures can be scored accurately. Acceptable
agreement among the scorers is considered to be attained when 80 percent
of all pairs of rater scores are the same for the performance measures
in a scorable unit. If less than 80 percent agreement is obtained, then
the performance measures are revised until an adequate level of scoring
consistency is attained.
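The 80 percent agreement criterion can be sketched as a pairwise comparison of rater scores. This is one plausible reading of "all pairs of rater scores", assuming that several raters score the same performance on every performance measure of the scorable unit; the function name and data layout are illustrative assumptions.

```python
from itertools import combinations
from typing import Sequence


def pairwise_agreement(ratings: Sequence[Sequence[int]]) -> float:
    """Proportion of rater-pair comparisons that agree.

    `ratings[r][m]` is rater r's pass-fail score (1 or 0) on performance
    measure m. Every pair of raters is compared on every measure.
    """
    n_measures = len(ratings[0])
    if any(len(r) != n_measures for r in ratings):
        raise ValueError("every rater must score the same performance measures")
    agreements = 0
    comparisons = 0
    for rater_a, rater_b in combinations(ratings, 2):
        for score_a, score_b in zip(rater_a, rater_b):
            comparisons += 1
            agreements += int(score_a == score_b)
    if comparisons == 0:
        raise ValueError("at least two raters and one measure are required")
    return agreements / comparisons


# Three hypothetical raters scoring five performance measures.
ratings = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 1, 0, 1, 1],
]
agreement = pairwise_agreement(ratings)  # 13 of 15 comparisons agree
print(agreement >= 0.80)  # True: no revision needed under the 80 percent rule
```

A scorable unit falling below 0.80 would, under this reading, have its performance measures revised and the tryout repeated.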

For written tests the tryout is concerned with establishing the
effectiveness of scorable units in distinguishing between performers and
nonperformers, and with assuring that all elements in a scorable unit
are consistent in estimating ability to perform the task. This tryout
helps assure that all items of a scorable unit contribute to measuring
performance on the task.

A final evaluation of the written scorable units is conducted after
operational administration of the tests. A representative sample of
answer sheets is selected for analysis, and the difficulties of items and
scorable units are obtained. Those with high difficulty are examined to
determine if they are faulty. Faulty items and scorable units are
deleted prior to final scoring. When all steps of the review and
analysis procedure for the written scorable units are accomplished, their
validity as fair measures of ability to perform job tasks is considered
to be reasonably well established.
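The post-administration difficulty screen can be sketched as follows. The text does not state how difficulty is computed or what level counts as high, so the failure-rate measure, the review threshold, and all names below are assumptions for illustration only. Flagged items are candidates for expert examination, not automatic deletions.

```python
from typing import List, Sequence


def item_failure_rates(answer_sheets: Sequence[Sequence[bool]]) -> List[float]:
    """Proportion of sampled soldiers failing each item (True = item passed)."""
    if not answer_sheets:
        raise ValueError("at least one answer sheet is required")
    n_items = len(answer_sheets[0])
    if any(len(sheet) != n_items for sheet in answer_sheets):
        raise ValueError("every answer sheet must cover the same items")
    n_sheets = len(answer_sheets)
    return [
        sum(1 for sheet in answer_sheets if not sheet[i]) / n_sheets
        for i in range(n_items)
    ]


def flag_for_review(rates: Sequence[float], review_threshold: float) -> List[int]:
    """Indices of items whose failure rate meets or exceeds the threshold."""
    return [i for i, rate in enumerate(rates) if rate >= review_threshold]


# Hypothetical sample of four answer sheets on a three-item scorable unit.
sheets = [
    [True, True, False],
    [True, False, False],
    [True, True, False],
    [True, True, True],
]
rates = item_failure_rates(sheets)    # [0.0, 0.25, 0.75]
print(flag_for_review(rates, 0.70))   # [2]
```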

ASSUMPTIONS FOR USING SQT SCORES

The assumptions on which SQTs are scored can be clearly explicated,
as can the operations that determine test content, scoring procedures
and standards.

In this section, three sets of assumptions made in using SQT scores
are considered. These are using SQTs to (a) help determine training
requirements, (b) help select soldiers in a single specialty, and (c)
help select soldiers in merged specialties.

HELP DETERMINE TRAINING REQUIREMENTS

The assumptions required for using SQTs to help determine training
requirements are straightforward. They are simply: (a) tasks can be
defined: task elements or behaviors can be specified, conditions given,
and standards of adequate performance established; (b) tasks can be
measured validly: performance on the task is measured by scorable units,
which contain items or performance measures related to task elements, and
the sum of the elements passed in a scorable unit indicates quality of
performance on the task; (c) task elements are weighted equally: items
or performance measures corresponding to task elements or behaviors are
scored as pass-fail, or as one-zero.
These three assumptions serve to provide operational definitions of
performance on the tasks measured in SQTs. Although task elements do
not have to be weighted equally, research evidence indicates that
differential weighting generally does not improve the quality of measurement.
A common practice is to give an element greater weight by preparing
several items or performance measures for it.

The assumptions needed to help determine training requirements
pertain only to tasks taken one at a time. Since the current training
philosophy is to train on discrete tasks, no assumption about the
interrelationships among the tasks is required.




HELP SELECT SOLDIERS IN A SINGLE SPECIALTY

The case of using SQTs to help select soldiers in a single
specialty requires additional assumptions about the interrelationships
among job tasks and scorable units that measure task performance. The
same three assumptions about measuring task performance are required
(tasks can be defined, tasks can be measured validly, and task elements
are weighted equally).



In addition, three more assumptions are required: (a) scorable units
are weighted equally -- all are scored as GO/NO-GO or as one-zero; (b) test
score is the number of scorable units performed correctly -- the total
score is obtained by adding up the number of scorable units passed; and
(c) the percent of scorable units passed indicates level of job
performance -- the percent of scorable units passed corresponds to the
proportion of job tasks a soldier can perform. Given these assumptions,
SQTs define job proficiency, and the percent of scorable units correct
(percent-correct) is a direct reflection of job proficiency. Standards of
proficiency can then be set in terms of percent-correct.
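
The scoring rule implied by these assumptions can be sketched in a few
lines of Python. This is only an illustration: the function name and the
sample GO/NO-GO results are hypothetical and do not come from any Army
scoring system.

```python
def sqt_percent_correct(scorable_units):
    """Percent of scorable units passed.

    Each entry is True (GO) or False (NO-GO). All units carry equal
    weight, as the assumptions in the text require.
    """
    if not scorable_units:
        raise ValueError("an SQT must contain at least one scorable unit")
    passed = sum(1 for go in scorable_units if go)
    return 100.0 * passed / len(scorable_units)


# Hypothetical soldier who passes 4 of 5 scorable units.
example_results = [True, True, False, True, True]
example_score = sqt_percent_correct(example_results)
```

Because every unit counts as one or zero, the total score is a simple
count, and percent-correct can be read directly as the proportion of
tested tasks performed to standard.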



HELP SELECT SOLDIERS IN MERGED SPECIALTIES 

In the case of merged specialties, an additional assumption is
required about the relationships among the jobs or groups of soldiers.
The first six assumptions made in the case of the single specialty result
in criterion-referenced measurement for each of the jobs being merged.
However, in order to maintain criterion-referenced standards for merged
specialties, the assumption is required that the jobs being merged are
equal -- that is, equal levels of proficiency in the individual jobs are
equal to each other in an absolute sense, or stated operationally, all
scorable units from all the relevant SQTs are weighted equally. Thus a
soldier qualified in specialty 45N, for example, is equal to the
qualified soldier in 45P, regardless of the percentage of soldiers in
each qualified group. An implication of this assumption that the jobs
being merged are equal is that if one qualified group contained 5 percent
of a first MOS population while a second qualified group contained 50
percent of a second MOS population, the merged qualified group would
contain proportionally more soldiers from the second group.

In the above example, each MOS would be represented in the merged
qualified group in accordance with the number of soldiers from each MOS
who attained qualifying scores. One MOS may be proportionally
overrepresented, while the second MOS is minimally represented or possibly
not represented at all. How to use and maintain performance standards
for merging MOS is a policy decision, and not a technical question.
However, the criterion-referenced properties of SQTs permit rational
policy decisions.

An alternative assumption in the case of merged specialties is that
the groups, and not the MOS, are equal -- that is, equal percentile-rank
scores indicate equal levels of job proficiency. The use of
percentile-rank scores, which indicate relative standing in a group,
facilitates proportional representation of each MOS in the merged
qualified group. For example, a policy decision could be made that 40
percent of each MOS be considered eligible for promotion. Such a policy
decision may be made if policy makers were not willing to assume that the
jobs were equal, or judged that the SQTs were not equally valid
criterion-referenced measures of all the merged MOS, or if the policy
makers decided that the need for proportional representation of the MOS
in the qualified group outweighed the need to maintain performance
standards. However, if SQTs are scored as percentile-rank and
qualifications are based on percentile-rank scores, then the job
performance standards would be given little or no consideration in
determining the qualified group.
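
The difference between the two assumptions can be made concrete with a
small Python sketch. The scores below are invented for illustration, and
the function names are hypothetical; only the MOS labels 45N and 45P and
the 40 percent figure are taken from the text.

```python
def qualified_by_standard(scores_by_mos, cutoff):
    """Jobs-are-equal assumption: one absolute percent-correct cutoff."""
    return {mos: [s for s in scores if s >= cutoff]
            for mos, scores in scores_by_mos.items()}


def qualified_by_quota(scores_by_mos, fraction):
    """Groups-are-equal assumption: the top fraction of each MOS qualifies."""
    result = {}
    for mos, scores in scores_by_mos.items():
        n = round(fraction * len(scores))
        result[mos] = sorted(scores, reverse=True)[:n]
    return result


# Invented percent-correct scores for ten soldiers in each MOS.
scores = {
    "45N": [85, 70, 65, 60, 55, 50, 45, 40, 35, 30],
    "45P": [95, 92, 90, 88, 84, 78, 70, 66, 60, 50],
}

by_standard = qualified_by_standard(scores, cutoff=80)
by_quota = qualified_by_quota(scores, fraction=0.4)
```

With these invented data, the absolute standard admits one soldier from
45N and five from 45P, so the merged group is dominated by 45P. The
40 percent rule admits four from each MOS, but three of the four 45N
soldiers fall below the 80 percent standard, which is the loss of
performance standards the text describes.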

BENEFITS FROM USING CRITERION-REFERENCED SQTs

The change in focus from norm-referenced Military Occupational
Specialty proficiency tests to criterion-referenced Skill Qualification
Tests has enabled training and personnel management to obtain more
comprehensive and meaningful information than before. Two major benefits
that have resulted from the adoption of the criterion-referenced approach
deal with (a) public nature of test content and (b) job performance
standards vs. personnel quotas. These benefits are discussed separately
in the following paragraphs.



PUBLIC NATURE OF TEST CONTENT 

An effective job proficiency testing program should be part of a
larger system that includes job requirements and individual training
programs. Modern instructional technology emphasizes the systems
approach to training, and a job proficiency testing program is an integral
component of the Army's modern training system.

Job requirements are defined by Soldier's Manuals, which list all the
tasks a soldier in an MOS skill level (job) is responsible for performing.
Soldier's Manuals are distributed throughout the Army for use by
individual soldiers and for developing training programs, both resident
courses and decentralized training conducted in units. Soldier's Manuals
are also used to develop SQTs. No task can be tested that is not in the
Soldier's Manual. Once the system becomes fully operational, all
components of the Army can know what each soldier should be able to do, is
able to do, and should be trained to do. There will be no surprise
requirements.

In addition to Soldier's Manuals, soldiers are given additional
detailed information about the job tasks on which they will be tested.
This information is contained in the SQT Notice, which lists the specific
tasks included in an SQT, how the tasks will be tested (written or
hands-on), standards, and a description of the actual test content.
Soldiers are given advance notice of what they will be required to know
and do. All soldiers in an MOS are given equal information about what
they will be tested on, potentially allowing them equal opportunity to
prepare for the test. Test content, at least in general terms, is public
knowledge.

The public nature of test content reduces the need for representative
sampling of tasks. One reason representative sampling of tasks is
important in the typical testing program is to give all examinees an
equal opportunity to demonstrate their competence. With the SQT Notice,
test content can be focused in areas, such as areas that have high
training needs or that are related to new equipment in the field.

The public nature of SQT content also helps establish an integrated
training and testing program based on critical job requirements. By
selecting test content that focuses on critical job requirements, training
efforts will tend to be directed toward these same requirements. Thus,
an integrated training and testing system is being developed based on
job requirements.

As long as individuals are tested on the specific requirements of
their job, there is no advantage to keeping the test content secret.
In fact, if the test is directly related to performance on the job, then
the proficient individual should already know the test content without
the benefit of the information contained in a test notice.

Minimizing Effects of Job Assignments on Test Scores. A problem
that arises in the typical testing program, where test content is kept
secret, is that some individuals have special advantages over others.
One possible advantage is that because of favorable job assignments, job
tasks and test content are very closely related for some individuals.
In the past, soldiers who were working outside of their MOS were at a
distinct disadvantage on test content based on MOS-specific job
tasks. The effects of bad assignments are minimized in the SQT program
because all MOS soldiers are told specifically what content will be
included in the test. The prior knowledge about test content tends to
equalize opportunities.

In the past some soldiers have had advantages because they were more
familiar with the voluminous references given for MOS tests. Some
soldiers did not have the references available to them, and some, even if
they did, had difficulty in identifying the critical information within
the mass of paper and words. In the Soldier's Manual and SQT Notices
the critical information is distilled and made available to all MOS
soldiers. Thus, soldiers with high verbal fluency or with access to
specialized information no longer retain such a distinct advantage.
Since the critical information is made available to all soldiers in a
form readily understood, the opportunities to acquire competence are
equally available to all soldiers.



Minimizing Fears About Taking Tests. Some individuals seem to have a
knack for doing well on tests, while others seem to freeze when confronted
with a testing situation. Test wiseness is frequently cited as an
explanation of why some do better than expected, and test anxiety is
ascribed as a reason why some do more poorly than expected. Both of these
factors -- test wiseness and test anxiety -- are undesirable influences
because they distort the meaning of test scores. In the SQT program,
where everyone has an opportunity to practice for the test, the effects of
test wiseness and test anxiety are minimized, and the scores are more
likely to reflect true levels of competence.

A factor related to test wiseness and test anxiety is the threat that
many soldiers experience when taking tests. The threat may be viewed as
having both objective and subjective components. A major source of
objective threat arises from the fact that SQTs are used to help make
personnel decisions that affect careers. Soldiers who do poorly on SQTs
are likely to be penalized, while those who do well are rewarded. The
test then, understandably, poses a threat to many soldiers, especially
those who are marginal performers or who are not familiar with testing,
or who have had negatively conditioning experiences in school situations.
Subjective components of threat may arise from a variety of circumstances,
such as personal characteristics, prior experience with tests, or from a
fear of being evaluated. The fear of being evaluated may arise because
the rules or basis for the evaluation are not explicit. If soldiers have
foreknowledge about the tasks they will be evaluated on, and the means by
which the evaluation will be conducted, then the subjective threat may
often be reduced. Prior knowledge about test content may equalize
opportunities for soldiers to demonstrate their true level of job
competence by reducing distortion of test scores arising from subjective
threat.

The public nature also has the general effect of increasing the
validity of the tests. By giving all MOS soldiers more of an equal
opportunity to prepare for the test, the test scores are more likely to
reflect true levels of competence.

JOB PERFORMANCE STANDARDS VS. PERSONNEL QUOTAS 

A criterion-referenced job proficiency test consisting of task-based
tests can be scored in terms of percent of tests correct, which is a
direct indicator of the percentage of job tasks a soldier can perform,
and therefore is a direct measure of level of job competence. The
percent of task-based tests correct can be interpreted because standards
are specified. The distribution of scores is not a relevant
consideration in interpreting the meaning of the scores.



Norm-referenced proficiency tests, in which items have no meaning in terms of job-related activities, have meaning only in
terms of percentile-rank scores. The percentage of items correct does not convey information because the population of
items has not been defined precisely. Since such test scores have no external referent, the scores can be interpreted only
in relation to the group taking that particular set of items. The tendency, based on traditional psychometric theory, is
to select items on the basis of their difficulty and correlation with total test score. If items do not have the desired
statistical properties, they are deleted or revised until they exhibit the proper difficulties and correlations with total score.
Resulting changes in test content, and therefore the correspondence between test and job content, are not systematically
taken into account.



For each task in an SQT two categories of performance are
established -- qualified and not qualified; therefore, SQTs provide
GO/NO-GO decisions on task performance. Soldiers either meet these
standards or they do not. The total SQT score is the sum of all scorable
units passed, which provides continuous scores ranging from all scorable
units correct to none, or 100 percent correct to 0 percent correct.



Current Army policy is that the SQT total score scale is divided into
three categories. The higher passing score, called the Qualification
Score, determines eligibility for award of the next higher skill level,
and therefore eligibility for promotion. Only persons with the
appropriate skill level are eligible for promotion. The Qualification
Score is set at 80 percent of the scorable units correct. The lower
passing score, called the Verification Score, determines eligibility to
retain the current skill level; the Verification Score is set at 60
percent of the scorable units correct. Soldiers with SQT scores below 60
percent correct may be reclassified to another MOS.
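
The three categories can be expressed as a short Python sketch. The 80
and 60 percent cutoffs come from the policy described above; the function
name, the category labels, and the treatment of a score exactly at a
cutoff as passing are assumptions made for illustration.

```python
QUALIFICATION_SCORE = 80.0  # percent of scorable units correct (from the text)
VERIFICATION_SCORE = 60.0   # percent of scorable units correct (from the text)


def sqt_category(percent_correct):
    """Place a percent-correct SQT score in one of the three categories.

    Assumption: a score exactly equal to a cutoff meets that cutoff.
    """
    if percent_correct >= QUALIFICATION_SCORE:
        return "qualified for next higher skill level"
    if percent_correct >= VERIFICATION_SCORE:
        return "verified at current skill level"
    return "may be reclassified to another MOS"
```

For example, a soldier scoring 72 percent would be verified at the
current skill level but would not be eligible for award of the next one.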

Rank Ordering and Performance Categories. If SQT scores are also
used to rank order soldiers, then in most cases the criterion-referenced
power of the tests will be reduced or lost entirely. The following cases
illustrate this point: the number of eligibles is a) equal to, b) less
than, and c) greater than the quotas.

a) If the quota and number of eligible soldiers are the same, then
the decisions of whether to promote, based on the hurdle, and when to
promote, based on rank order, have the same boundaries and there is no
conflict between quotas and standards.

b) If the number of eligibles is less than the quota and the
standards are waived until the quotas are met, then the rank ordering
would be used to decide both whether and when to promote. Waiving
standards could be equivalent to rank ordering. If the standards are
waived one unit at a time until the quotas are satisfied, then the effect
is to rank order with no regard to prerequisites. The waiving could be
done in larger units, say from 80 percent correct to 60 percent correct,
with the decision of when to promote then made on the basis of other
factors. How the waiving is accomplished, and how the tradeoff between
standards and quotas is achieved, are policy decisions. Waiving standards
forces an explicit decision about the tradeoff, whereas the pure rank
ordering approach ignores any consideration of standards. On the other
hand, if standards are not waived, then the rank ordering would be used
only to decide when to promote. In this case the quotas would be waived
in favor of increased quality.

c) If the number of eligibles is greater than the quota, then
depending on how the pool of eligibles becomes replenished, the
prerequisite standards may have varied meaning. If the pool of eligibles
is always larger than the quota, then some soldiers near the cutting score
may not be reached and consequently not promoted. If the pool is
exhausted before new soldiers are added, then these soldiers are assured
eventual promotion, and new soldiers who become eligible are placed into
a hold category until the original pool is exhausted. If the new
eligible soldiers are immediately added to the pool, then there is no
assurance that the remaining eligible soldiers from the original pool
will be promoted even though they surpassed the prerequisite standards.

The main point about hurdles vs. rank ordering is that the
criterion-referenced standards may be lost to the rank order unless
explicit decisions are made to retain the standards. Rank ordering lends
itself so easily to satisfying quotas that performance standards may be
readily bypassed. The ability to obtain objective standards of job
performance has profound impact on how personnel decisions can be made.
Personnel managers have a choice between using a priori derived standards,
independent of the population taking the test, and using quotas derived
independent of the content of the test. The traditional solution to
personnel decisions is to establish quotas, and then to select
individuals until the quotas are satisfied.



According to the criterion-referenced test model, levels of
performance within a proficiency category are not discriminated because
the criterion levels are the only points of interest. Continuous scores
are available, however, and they can be used for rank ordering soldiers.
Because SQTs can be scored either in terms of performance categories or
as continuous scores, explicit decisions can be made about which
methods or combination of methods to use, and how the scores will be
used in personnel decisions.

As a minimum, SQTs are used to set prerequisites for promotion. As
described above, the prerequisite score is waived to meet quotas if such
a policy decision is made. An immediate question is whether SQT scores
should be used to rank order the pool of soldiers eligible for promotion.
To oversimplify the question: SQTs are now used to determine whether to
promote. The question of when to promote can also be answered on the
basis of SQT scores, or can be based on other factors. (Other factors
besides SQT scores do affect promotability, but the oversimplified
version puts the issue in stark relief.) A discussion of how SQT scores
can be combined with other factors is presented later in this section.






An unfortunate consequence of using quotas is that performance
standards, which may be used in delineating a quota limit for one
particular point in time, may not be entirely relevant when applied in
another situation. If, for example, the top 50 percent in a job is
eligible for promotion, the job performance of the eligible group will
vary as the soldiers change over the years, or as the effectiveness of the
training programs changes, or as the relationship between test content and
job requirements changes over time.

Quality vs. Quantity in Personnel Decisions. A major breakthrough
resulting from criterion-referenced SQTs is the availability of objective
information about job competence that can be included in making personnel
decisions. Level of job performance measured by these tests provides an
absolute indication of proficiency that remains relatively constant as
long as jobs remain defined by existing Soldier's Manuals. Performance
standards for personnel decisions can be specified in terms of the
percentage of job tasks soldiers can perform. These standards are
external to the test, and therefore more powerful statements can be made
about the groups that are eligible to be selected in or out.

Quotas for personnel actions, such as promotion or attendance at a
school, are likely to remain a driving force for personnel management in
the foreseeable future. Rarely, if ever, will the number of soldiers
eligible for a personnel action, based on performance standards, be the
same as the required quota. Some adjustment to the quotas or performance
standards, or both, generally will be required. If quotas are given top
priority, then standards are waived; conversely, if performance is given
top priority, then quotas are waived. If both quotas and performance
standards are waived, say within some pre-established bounds, then a
tradeoff between quality and quantity can be established.

Decision rules about quality vs. quantity can be explicitly stated.
If performance standards are waived, there is a cost in terms of lowered
individual performance (quality) in order to obtain sufficient numbers
(quantity). If quotas are waived, there is a gain in individual
performance (quality), but insufficient numbers (quantity) are obtained.
By assigning values to units of performance and shortfalls, the tradeoff
between quantity and quality can be calculated. Again, the tests do not
dictate policy about quantity or quality, but they support decision rules
and permit operations not possible without them.
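
One way such a calculation might look is sketched below in Python. The
cost values, scores, and function name are hypothetical; the text says
only that values can be assigned to units of performance and to
shortfalls, not how they should be set. The sketch also assumes at least
as many soldiers were tested as the quota requires.

```python
def tradeoff_costs(scores, quota, standard, cost_per_point, cost_per_vacancy):
    """Cost of waiving standards vs. cost of waiving the quota.

    scores           -- percent-correct scores of the soldiers tested
    quota            -- number of positions to fill
    standard         -- percent-correct performance standard
    cost_per_point   -- assumed value of one percentage point below standard
    cost_per_vacancy -- assumed value of one unfilled position
    """
    ranked = sorted(scores, reverse=True)

    # Option A: waive standards and fill the quota from the rank order.
    selected = ranked[:quota]
    points_below = sum(standard - s for s in selected if s < standard)
    cost_waive_standards = points_below * cost_per_point

    # Option B: hold the standard and leave the remaining positions unfilled.
    eligible = [s for s in ranked if s >= standard]
    vacancies = max(0, quota - len(eligible))
    cost_waive_quota = vacancies * cost_per_vacancy

    return cost_waive_standards, cost_waive_quota


# Hypothetical case: five soldiers, four positions, an 80 percent standard.
costs = tradeoff_costs([90, 85, 78, 72, 65], quota=4, standard=80,
                       cost_per_point=1.0, cost_per_vacancy=8.0)
```

In this invented case, waiving standards costs 10 units (two selected
soldiers fall 2 and 8 points short) while holding the standard costs 16
units (two vacancies). Which option is preferable depends entirely on the
values policy makers assign.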

Weighting Factors in Personnel Decisions. The situation becomes more
complex when one does not base personnel decisions exclusively on test
scores, but rather uses test scores as one factor in a composite score.
Army personnel actions generally have been based on a composite score,
which is characterized as the whole-man concept. The composites may be
governed by explicit rules to provide objective indices, or the variables
may be combined in a subjective manner by the decisionmakers. An example
of explicit rules governing the combination of factors is Enlisted
Evaluation Scores based on a weighting of MOS test scores and Enlisted
Evaluation Report scores; another example is the determination of
whether a soldier meets the prerequisites for a particular job training
course, in which aptitude area scores, physical profile, and perhaps
prior training may be considered. An example of subjective combination
of factors is the process followed by a typical selection board that
interviews soldiers, examines their records, and then arrives at a
collective decision.

Criterion-referenced standards require the use of explicit rules for
setting the minimum levels of qualification. If the process of combining
scores for the qualified group is objective, explicit weights are
assigned to each variable, and the contribution of each variable to the
composite score can be specified.




The assigned weights and the actual weights may or may not be the
same. The actual weight of a factor is determined largely by the
variability or range of scores for that factor. If the range is small,
the effect is to add a virtual constant value to each individual's
score, regardless of assigned weight, and the small differences can have
only a small effect on the final rank ordering of the soldiers. If the
combining is based on subjective judgment, then the weighting of the
variables cannot be explicated. In either case, an important
consideration is how the minimum qualifications are treated in determining
eligibility for a personnel action. If the standards serve to categorize
soldiers into qualified and non-qualified groups and the qualified group
is then given the favorable treatment while the non-qualified group is
excluded from consideration, then the criterion-referenced standards are
operative. If, however, the minimum standards can be waived, then the
subjective process may easily ignore the standards, and the net effect
may be to lose the power that inheres in criterion-referenced standards.
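
The gap between assigned and actual weights can be illustrated with a
brief Python sketch. All scores and weights below are invented, the
function names are hypothetical, and the weighted range is used only as a
crude stand-in for a factor's variability.

```python
def weighted_range(values, weight):
    """Spread a factor can contribute to the composite (crude measure)."""
    return weight * (max(values) - min(values))


def rank_order(names, scores):
    """Names sorted from highest to lowest score."""
    pairs = sorted(zip(names, scores), key=lambda p: p[1], reverse=True)
    return [name for name, _ in pairs]


names = ["A", "B", "C", "D"]
sqt = [95, 80, 70, 60]          # invented scores with a wide range
report = [118, 120, 119, 120]   # invented scores with a narrow range

# The narrow-range factor is assigned three times the weight of the SQT.
w_sqt, w_report = 1.0, 3.0
composite = [w_sqt * s + w_report * r for s, r in zip(sqt, report)]

order_by_composite = rank_order(names, composite)
order_by_sqt = rank_order(names, sqt)
```

Despite its larger assigned weight, the narrow-range factor can move the
composite by only 6 points, against 35 points for the SQT, so the final
rank order here is the same as the rank order on SQT score alone.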



The process of combining scores may also be based on successive
hurdles. The use of successive hurdles for combining scores virtually
assures that standards will be maintained. Establishment of the minimum
levels of qualifications requires explicit decisions, and any waiving
then must also be explicit. An example of multiple hurdles is the
determination of eligibility for entrance in a job training course. A
minimum aptitude area score is set, usually at 90, and other minimum
prerequisites may also be included in the decision, such as physical
profiles, prior military job training, and high school courses completed.
Not all eligible persons enter a course, but unqualified persons are
excluded unless a specific waiver is applied. The use of hurdles is
compatible with criterion-referenced standards.
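
A successive-hurdles rule of this kind might be sketched in Python as
follows. The minimum aptitude area score of 90 comes from the text; the
record fields, hurdle names, and waiver mechanism are hypothetical and
serve only to show that every waiver has to be named explicitly.

```python
def passes_hurdles(candidate, hurdles, waivers=()):
    """Check a candidate against each hurdle in turn.

    Returns (eligible, failed), where failed lists the hurdles that were
    neither met nor explicitly waived.
    """
    failed = [name for name, check in hurdles
              if not check(candidate) and name not in waivers]
    return (not failed, failed)


# Hypothetical hurdles for entrance to a job training course.
course_hurdles = [
    ("aptitude_area_score", lambda c: c["aptitude_area_score"] >= 90),
    ("physical_profile", lambda c: c["physical_profile_ok"]),
]

candidate = {"aptitude_area_score": 85, "physical_profile_ok": True}

without_waiver = passes_hurdles(candidate, course_hurdles)
with_waiver = passes_hurdles(candidate, course_hurdles,
                             waivers=("aptitude_area_score",))
```

Unlike a weighted composite, a high value on one factor cannot offset a
failure on another; the candidate above becomes eligible only when the
aptitude hurdle is waived by name.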



SQTs, because of their criterion-referenced properties, permit basing
personnel decisions on objective performance standards. As has been
mentioned, technical feasibility does not necessarily dictate policy, and
therefore personnel decisions need not necessarily be based on
performance standards. However, since the possibility exists, rational
evaluation of the costs and benefits in changing to new personnel policies
can now be accomplished by decisionmakers.



CONCLUSIONS 



Two themes have pervaded the discussion of criterion-referenced Skill
Qualification Tests: 1) test content is based on systematic analysis of
job requirements; 2) SQTs provide new opportunities for training managers,
personnel managers, and research personnel to reassess and redefine their
functions.

SQTs provide new information about levels of job performance not
previously available from traditional proficiency and achievement tests.
However, the power inherent in this information would be lost unless
explicit use is made of the criterion-referenced performance data
available from SQT scores.

For training managers and job supervisors, feedback from SQTs can be
used to structure individualized training programs based on critical job
tasks. Instead of basing training requirements on global evaluations of
performance, training programs can be based on specific job tasks that
are critical to both unit mission and individual job requirements.

Personnel managers have responsibilities for defining job specialties
and for matching individuals and jobs. Under traditional procedures,
jobs have tended to be defined in general terms of functions, skills,
and knowledges. Similarly, individual qualifications have also been
assessed in global terms, such as total MOS proficiency score, training
courses completed, or time in grade. With the technology underlying the
SQT program, and all of modern instructional technology, both job
requirements and individual qualifications can be stated more precisely
-- critical job tasks define job requirements, and performance on these
critical tasks defines levels of proficiency.

Finally, research personnel may have to reconceptualize their
function. Traditionally, they have focused their efforts on developing
statistical techniques for improving the accuracy of test scores.
However, in criterion-referenced testing, establishing the content of a
test is prerequisite to, and therefore perhaps even more important than,
improving the accuracy of test scores. The interpretation of test scores
in criterion-referenced testing is always dependent on being able to
provide an explicit linkage between test content and test scores.
Research efforts are required that explore and define the relationship
between test content and test scores. For example, there is a need for
research on development of score scales designed to reflect realistic
standards of performance.

Because of the need to establish an operational testing program to
meet a tight schedule, some decisions were made that appear reasonable
but are not supported by an existing test theory. One example of such a
decision is how to match scores from different tests. SQTs are assumed
to be of equal difficulty and relevance to all job incumbents, which is
the most reasonable assumption given the current state of the art. New
theoretical developments are required to develop score scales that can
equate scores of soldiers tested on different tasks. A promising
approach is available in latent trait theory, which addresses many of the
problems faced in developing SQTs. The applicability of latent trait
theory, however, has not yet been sufficiently demonstrated in any
large-scale testing program, especially one confronted with the limited
resources available to test development activities.